Mass Spectrometry–Based Proteogenomics: New Therapeutic Opportunities for Precision Medicine

Proteogenomics refers to the integration of comprehensive genomic, transcriptomic, and proteomic measurements from the same samples with the goal of fully understanding the regulatory processes converting genotypes to phenotypes, often with an emphasis on gaining a deeper understanding of disease processes. Although specific genetic mutations have long been known to drive the development of multiple cancers, gene mutations alone do not always predict prognosis or response to targeted therapy. The benefit of proteogenomics research is that information obtained from proteins and their corresponding pathways provides insight into therapeutic targets that can complement genomic information by providing an additional dimension regarding the underlying mechanisms and pathophysiology of tumors. This review describes the novel insights into tumor biology and drug resistance derived from proteogenomic analysis while highlighting the clinical potential of proteogenomic observations and advances in technique and analysis tools.


INTRODUCTION
As the molecular genetics revolution of the 1990s brought us a more detailed and mechanistic view of the processes underpinning malignancies, the hope of moving from empiric therapies to more targeted approaches by focusing on specific driver mutations began to flourish and motivate multiple clinical trials. While early results in the 1990s were mostly disappointing, the emergence of imatinib as a successful targeted therapy for chronic myeloid leukemia highlighted the importance of identifying the appropriate clinical subpopulation for precision medicine (1,2). It could be argued that the success of imatinib in early clinical trials hinged on the availability of an easily detected molecular marker, the Philadelphia chromosome, for the identification of patients with the signature BCR-ABL1 fusion (1). In a similar fashion, the selection of HER2-amplified breast cancer patients for trastuzumab therapy and estrogen receptor (ER)-positive patients for tamoxifen therapy represented an extension of the concept to solid tumors (3). In the two decades since these foundational applications, the use of genomics and/or transcriptomics to select the most appropriate targeted therapy has ushered in the era of precision oncology.
Early attempts at implementing precision oncology were bolstered by the efforts of The Cancer Genome Atlas (TCGA) and other large consortia focused on cataloging somatic mutations, copy number alterations (CNAs), DNA methylation, and comprehensive transcriptomic analyses of specific tumor types (4). Transcriptomic data were mined to produce prognostic signatures associated with outcome as a step toward customizing the intensity of therapeutic interventions to the individual's likelihood of progression (5). Early examples include the use of MammaPrint and OncoType DX to stratify breast cancer patients for aggressive therapy. More recently, these nucleic acid-based analyses have been extended to include comprehensive analyses of proteins and posttranslational modifications (PTMs) of tumors to create the new field of proteogenomics, which integrates protein-level measurements with genomics and transcriptomics and is exemplified by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the International Cancer Proteogenome Consortium (ICPC). Beginning in the 2010s, CPTAC investigators provided in-depth proteogenomic characterization of selected tumor types that had been either previously analyzed by TCGA or prospectively collected for proteomic analyses (Table 1). These proteogenomic studies have enabled the tracking of information flow within tumors, including the identification of pathway-level changes associated with clinical outcomes. In this review, we summarize the results of these comprehensive proteogenomic analyses, provide new insights resulting from these studies, and highlight their existing and future clinical applications. We end with a discussion of current challenges limiting the translational benefits of proteogenomics and the technological advancements under development to address those challenges.
of over 200 publications, that demonstrate the additional dimensions of cancer biology revealed by proteomic and phosphoproteomic analysis of solid and liquid tumors (Figure 1, Table 1). Novel insights from proteomics and phosphoproteomics that have potential therapeutic implications are highlighted. Studies focused on mapping the proteogenomic landscape of colorectal (7), breast (8), and ovarian (9) cancers were among the early efforts that enabled integration and aggregation of proteomic and phosphoproteomic analyses with corresponding genomic and transcriptomic data sets.
Zhang et al. (7) performed proteomics on 95 colorectal cancer (CRC) patient samples that were previously annotated through the TCGA. Integration of CNAs with transcriptomic and proteomic analyses uncovered many hot spots with potential driver alterations that could cause phenotypic perturbations. Among these, the 20q amplicon was associated with the largest global changes at the messenger RNA (mRNA) and protein levels, revealing key driver genes such as HNF4A, SRC, and TOMM34 that were previously underappreciated in CRC pathogenesis (7). Proteogenomic profiling of 110 prospective CRC samples led to the discovery of additional biomarkers, Rb phosphorylation, and dependence on glycolysis, which promote tumorigenesis (10).
Proteogenomic characterization of 77 patients with breast cancer provided a more complete picture of how previously TCGA-annotated CNAs (5q) and mutations (TP53, PIK3CA) manifest at the protein level (8). In addition to ERBB2, phosphoproteomics identified other highly phosphorylated kinases, namely cyclin-dependent kinase 12 (CDK12), PAK1, PTK2, RIPK2, and TLK2, that also contribute to the luminal tumor phenotype. Proteomics captured key protein pathways to distinguish breast cancer subtypes (particularly the stromal subtype) that are not reflected at the mRNA level (8). A more recent study integrated multiomics data from 122 treatment-naïve primary breast cancer samples (11). Deconvolution of mRNA and proteomic signatures indicated that a subset of luminal breast cancers overexpressed immune checkpoint and STAT1/IFNG genes, suggesting a potential role for immunotherapy in this setting. The authors also showed that Rb protein status correlated with response to CDK4/6 inhibition (11). These studies demonstrate that measuring the proteome is essential in overcoming the bottleneck that limits the translation of genomics to therapeutic strategies.
Proteomic characterization of 174 ovarian high-grade serous carcinoma (HGSC) patient samples, which had paired TCGA genomic analyses, demonstrated that proteomic utility is maximized when combined with genomics (9). One of the hallmarks of HGSC is chromosomal instability, as revealed by extensive CNAs. Proteins enriched in cell motility, invasion, and immune regulation were associated with CNAs and could be used to predict and stratify overall patient survival (9). A subsequent study from our laboratory extended proteogenomic analysis to 83 patient samples, implicating the activation of mitotic kinases and replicative stress as markers of ovarian HGSC (12).
Proteogenomic profiling of 103 clear cell renal cell carcinoma patient samples revised the current tumor classification to include immune-based subtyping, information that could not be gleaned from transcriptomics alone (13). Upregulation of proteomic signatures associated with hypoxia, glycolysis, epithelial-mesenchymal transition (EMT), and inflammation was observed alongside a stark downregulation in oxidative phosphorylation.
CPTAC studied 95 endometrial carcinomas at genomic, transcriptomic, and proteomic levels and found distinct protein signatures associated with histologic subtypes (14). They found a novel regulation of EMT by QK1, circular RNA, and ESRP2, which was associated with progressive disease. Although higher tumor mutation burden (TMB) is associated with better response to immunotherapy in many tumors, their analysis found low levels of antigen-processing machinery (APM) in some TMB-high tumors, potentially limiting the efficacy of immunotherapy and suggesting that APM should be considered in future clinical trials.
Recent advances with targeted inhibitors and immunotherapy have begun to improve survival for lung adenocarcinoma (LUAD). CPTAC investigators studied 110 paired LUAD tumors with normal adjacent tissue using multiomics (15). Phosphoproteomics identified targetable kinases for combination therapy: SOS1 in KRAS-mutant LUAD and PTPN11 in ALK- and EGFR-mutant LUAD, the latter of which is being tested in clinical trials. Immunotherapy markers were explored as well, and an association of STK11 mutations with immune-cold tumors was noted. Unlike LUAD, lung squamous cell carcinoma (LSCC) has not benefitted from targeted therapy and remains difficult to treat. Proteogenomic characterization of 108 LSCC samples revealed unique subtypes associated with EMT and phosphorylation signatures (16). Similar to CRC (10) and breast cancer (11), Rb protein amount and phosphorylation were suggested as markers for response to CDK4/6 inhibitors based upon proteomic data, and immune profiling revealed a spectrum of immune-cold to immune-hot tumors to potentially guide immunotherapy.
Glioblastoma multiforme (GBM) is the most aggressive brain malignancy and has a very high mortality. Ninety-nine treatment-naïve GBM tumors were analyzed by proteogenomics, metabolomics, and single-nuclei RNA sequencing (17). Phosphoproteomics identified increased activity of receptor tyrosine kinases (RTKs), protein tyrosine phosphatase nonreceptor type 1 (PTPN1), and phospholipase C gamma 1 (PLCG1) signaling hubs, suggesting a potential therapeutic option. GBM could also be characterized into different immune subtypes, which could potentially be used in selecting immunotherapy. In contrast to GBM, pediatric brain tumors are both rarer and more diverse. A proteogenomic analysis of 218 pediatric brain tumors was used to identify unique and common features for this rare disease (18). In particular, proteomics and phosphoproteomics were able to identify striking similarities between subgroups of craniopharyngioma and low-grade glioma tumors with BRAF V600E mutations, highlighting a potential therapeutic approach.
Pancreatic ductal adenocarcinoma (PDAC) is a lethal cancer and difficult to treat, due to both the difficulty in resecting the pancreas and the lack of response to chemotherapy. Multiomic analysis of 140 PDAC samples and corresponding healthy tissue samples identified distinct glycoprotein expression associated with some KRAS mutations but not others, revealing a complexity beyond the simple presence of KRAS mutations. Immune-cold PDAC tumors were also associated with reduced endothelial cells and with increased vascular endothelial growth factor (VEGF) and hypoxia-inducible factor (HIF).
Head and neck squamous cell carcinoma (HNSCC) can be broadly classified into human papilloma virus (HPV) associated and HPV negative, with the latter having a much worse prognosis. Multiomic approaches were used to characterize 108 HPV-negative HNSCC tumors and matched normal adjacent tissue samples (19). Epidermal growth factor receptor (EGFR) is a known target in HNSCC, but amplification does not always predict response. Huang et al. (19) found that overexpression of EGFR ligands as measured by proteomics was more predictive of both EGFR activity and clinical response to EGFR antibodies such as cetuximab, which block ligand binding. With respect to immunotherapy, HPV-negative HNSCC had low levels of antigen presentation, leading to immune-cold tumors. Taken together, proteomic analyses have identified novel therapeutic targets, informed downstream validation experiments, and provided a detailed landscape for solid tumors.

Liquid Tumors
Acute myeloid leukemia (AML) is difficult to treat and has an overall five-year survival rate of less than 25% (20). Cytotoxic chemotherapy has remained the primary treatment for decades, with minimal improvement in patient outcomes. Recent attempts to create small-molecule kinase inhibitors, similar to imatinib (1,21), have met with limited success, owing to the heterogeneity of AML (22,23).
A number of studies have evaluated the genomic landscape of AML with respect to mutational and drug response profiling (22, 24-26). However, the underlying biology connecting genomic aberrations with drug response is not easily apparent; proteomic analyses have helped bridge this gap (27). Proteomic profiling has identified a novel Mito-AML subtype (28) and age-dependent alterations contributing to chemoresistance (29). Posttranscriptionally regulated proteins have been identified in genetically defined subsets (e.g., KDM4 isoforms in IDH1/2 mutation-positive patients or elevated nuclear importins in NPM1-mutated AML patients) (30). These examples underscore the ability of proteomic data to provide novel mechanistic insights.
Kinase-substrate enrichment analysis (KSEA), which evaluates mass spectrometry (MS)-based phosphoproteomic data and infers kinase and associated network activity, was originally developed using AML models (31). KSEA identified phosphatidyl inositol 3 kinase (PI3K), casein kinases, CDKs, and p21-activated kinases as the kinases that are most frequently enriched in AML. Similarly, clustering of differentiation marker expression has been used to infer remodeled AML kinase-signaling networks during differentiation, identifying increased activity of prosurvival pathways regulated by MAP2K1 and protein kinase C (PKC) (32). Based on a study of primary patient samples, a phospho-signature containing seven validated peptides was found to predict FLT3 inhibitor response in patients with 78% accuracy (33). Selected reaction monitoring (SRM) or immunological detection assays of peptide signatures would enable clinical testing of biopsies. In the future, predictive phospho-signatures could facilitate personalized drug selection or real-time monitoring to enable early detection of drug resistance.
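To make the idea behind kinase activity inference concrete, the following minimal Python sketch scores each kinase by comparing the mean log2 fold change of its annotated substrate sites with the mean of all quantified phosphosites, scaled by the square root of the substrate count and the overall standard deviation. This z-score form is one commonly used formulation and may differ from the exact implementation in reference 31. The site identifiers, kinase names, use of the population standard deviation, and the minimum-substrate cutoff are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def ksea_scores(log2fc, kinase_substrates, min_substrates=2):
    """Return {kinase: z} for a KSEA-style enrichment score.

    log2fc            -- {site_id: log2 fold change} for all quantified sites
    kinase_substrates -- {kinase: [site_id, ...]} from a kinase-substrate
                         annotation (hypothetical identifiers here)
    min_substrates    -- assumed cutoff; kinases with fewer observed
                         substrates are not scored
    """
    all_values = list(log2fc.values())
    mean_all = mean(all_values)
    sd_all = pstdev(all_values)  # assumption: population standard deviation
    if sd_all == 0:
        raise ValueError("no variation in fold changes; scores are undefined")

    scores = {}
    for kinase, sites in kinase_substrates.items():
        # Keep only substrate sites that were actually quantified.
        observed = [log2fc[s] for s in sites if s in log2fc]
        if len(observed) < min_substrates:
            continue
        m = len(observed)
        scores[kinase] = (mean(observed) - mean_all) * math.sqrt(m) / sd_all
    return scores


if __name__ == "__main__":
    # Toy data, not taken from any study.
    toy_fc = {"S1": 2.0, "S2": 1.0, "S3": -1.0, "S4": -2.0, "S5": 0.0}
    toy_map = {"KIN_A": ["S1", "S2"], "KIN_B": ["S3", "S4"], "KIN_C": ["S5"]}
    for kinase, z in sorted(ksea_scores(toy_fc, toy_map).items()):
        print(f"{kinase}: z = {z:+.2f}")
```

A positive score suggests that the substrates of a kinase are, on average, more phosphorylated than the background; in practice the scores would be converted to p-values and corrected for multiple testing.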
MS-based proteomic approaches mapped the effects of targeted agents on AML cell lines and primary samples and identified key pathways affected by drug treatment (33-35) and mutational status (36) or targetable pathways associated with drug resistance (37,38). We previously used computational approaches to integrate proteomic profiling with genomics, transcriptomics, and small-molecule inhibitor sensitivity data sets to create models that recapitulate patient biology, which can be leveraged to prioritize treatment strategies (37,39).
Pluripotent self-renewing leukemic stem cells (LSCs) must be fully characterized to develop therapies that eradicate residual disease and achieve long-term remissions. MS analyses have revealed LSC-specific changes in oxidative phosphorylation, adhesion molecule composition, and RNA processing properties (40). Similar findings were uncovered from in-depth proteomic studies performed on 47 adult and 22 pediatric AML samples taken throughout disease progression (41). Mitochondrial ribosomal protein and subunits of the respiratory chain complex were enriched at relapse, suggesting a role for altered energy metabolism.
To uncover nongenetic mechanisms of multiple myeloma lenalidomide resistance, global tandem mass tag (TMT)-based proteomic and phosphoproteomic analyses were performed on paired pretreatment and relapsed samples. A CDK6-governed resistance signature was uncovered, which included high-risk factors such as thyroid hormone receptor interactor 13 (TRIP13) and ribonucleotide reductase catalytic subunit M1 (RRM1) and identified synergy between CDK inhibition and lenalidomide treatment (43). Proteomic analyses performed on liquid tumors have uncovered numerous classification and response signatures with translational value. Proteomic-based tests could be leveraged in the future to identify targeted therapies for individual patients, to monitor drug resistance, and to detect disease recurrence.

PROTEOMIC ANALYSIS OF BLOOD AND BODY FLUIDS
In clinical laboratory testing, blood or body fluids are most widely used to determine disease diagnosis and prognosis. One advantage of liquid malignancies is that blood samples can be easily collected in the clinic, providing viable cells for downstream proteomic applications. This is particularly useful for longitudinal analyses to understand response and resistance over the course of treatment. For example, the availability of serial samples from 41 patients allowed liquid chromatography-tandem mass spectrometry (LC-MS/MS) to be used to identify changes in global and phosphorylated proteins associated with AML relapse. This study demonstrated that relapse was associated with increased expression of RNA processing proteins and decreased expression of V-ATPase proteins. Further, there was an increase in phosphorylation events catalyzed by CDKs and casein kinase 2 (44,45). Since these pathways can be targeted therapeutically, this may inform clinical approaches to identify and target resistance pathways.
The ability to isolate various cell types from blood samples using flow cytometry also enables proteomic profiling at a subpopulation level. In a recent study, proteomic characterization was performed to identify specific vulnerabilities of CD34+ LSCs that can be leveraged therapeutically (40). Similar proteomic analyses were performed on subsets of monocytes, identifying over 5,000 proteins potentially associated with disease processes (46). While the analysis of blood samples is relatively straightforward, analyzing bone marrow biopsies from leukemia patients is still necessary to understand the full tumor ecosystem, which encompasses communication between the leukemic cells and neighboring cells of the marrow microenvironment. We have used pre- and on-treatment AML bone marrow samples to identify early biomarkers that can be harnessed to circumvent the development of FLT3 drug resistance (37).
While proteomic applications have been readily incorporated into studies of drug resistance within liquid tumors, this application remains largely uncharted territory in solid tumors, at least partially due to the invasive nature of collecting a single biopsy or longitudinal samples. Despite these challenges, MS-based proteomic strategies have been applied to profile many solid tumors as discussed above (47).
More recently, proteomic approaches have also been developed to analyze extracellular vesicles (EVs) and body fluids for the presence of disease progression and drug resistance markers. Proteomic profiling of EVs isolated from Ewing sarcoma cell lines compared to healthy human plasma led to the discovery of the Ewing sarcoma-specific markers CD99 and nerve growth factor receptor (NGFR) (48). Parallel analyses of EVs in breast cancer identified subtype-specific biological processes and molecular pathways, including hyperphosphorylated receptors, kinases, and defined protein signatures that closely reflect the associated clinical pathophysiology (49).
Examination of body fluids such as urine, saliva, tears (50), and plasma via proteomics has similarly yielded disease biomarkers and informed precision oncology approaches (47). Cerebrospinal fluid (CSF) surrounds the brain and spinal cord, providing mechanical and immunological protection; however, it is not sampled as often as blood in central nervous system malignancies because its collection is significantly more invasive. In a recent study, 251 CSF samples from patients with four types of brain malignancies and healthy individuals were analyzed by proteomics. By integrating CSF data with proteomic analyses of corresponding tumor tissue and primary glioblastoma cells, CSF biomarkers such as chitinase-3-like protein 1 and glial fibrillary acidic protein were identified (51). Despite challenges in sample accessibility and collection, proteomic characterization of a plethora of body fluids has fueled the discovery of new biomarkers and therapeutic targets.
Novel methods in single-cell MS-based profiling (52) and subcellular or EV proteomic evaluation (53-56) are also uncovering new biology and pathways for therapeutic targeting. Computational workflows such as SCeptre (Single Cell proteomics readout of expression) have enabled the normalization of global single-cell MS data from up to 1,000 cells. These new single-cell techniques can be deployed on bulk or enriched populations of cells, enabling the exploration of leukemia cell heterogeneity (52). Proteomics performed on subcellular isolates from AML cells such as plasma membrane (54), the nucleus (53,55), and EVs (56) have enabled leukemia subclone tracking by identifying 50 leukemia-enriched plasma membrane proteins (54), helped to identify novel therapeutic targets such as the nuclear protein S100A4, and supported the evaluation of drug combinations such as the combination of the nuclear export inhibitor selinexor with the MDM2 inhibitor nutlin-3a or the AKT inhibitor MK-2206 (53,55).
Recently, several studies successfully harnessed the power of comparative spatial proteomics as a discovery tool to unravel disease mechanisms in solid tumors. While spatial proteomics approaches are difficult to implement in liquid tumors, this approach can be used for functional identification of rare tumors, tumor-infiltrating immune cells, and dissection of cellular mechanisms (57). As an example, integration of spatial proteomics with traditional phosphoproteomics uncovered that fibroblast growth factor receptor 2 beta (FGFR2b) stimulated by its ligand fibroblast growth factor 10 (FGF10) activates mammalian target of rapamycin (mTOR)-dependent signaling and ULK1 in recycling endosomes, resulting in suppression of autophagy and cell survival (58). Going forward, the improvement of single-cell proteomics along with spatial proteomics can significantly deepen our understanding of disease biology, particularly when combined with drug response and clinical outcome data.
After the initial proof-of-concept proteogenomic studies (7,62), simultaneous analysis of proteins and phosphorylation became the baseline to gain critical insights into signaling network activities in cancer. In addition to increasing throughput, isobaric labeling also enables integrated protein and PTM analysis (i.e., from exactly the same sample) while significantly reducing the sample input requirement for individual samples. The latter has important implications for the effective utilization of size-limited clinical samples because comprehensive analysis of PTMs, generally present at substoichiometric levels, requires enrichment from large amounts of sample. More PTMs can be added to a basic integrated workflow (74) where 5% of TMT-labeled peptides are used for analysis of unmodified peptides and 95% of peptides are subjected to phosphopeptide enrichment using immobilized metal affinity chromatography (IMAC). For example, the enrichment of ubiquityl peptides (75) and tyrosine-phosphorylated peptides (76) using antibody-based methods can be added before IMAC; acetyl peptides (11,14,16,17) and glycosylated peptides (60) can be enriched from IMAC flow-through using antibody- and chromatography-based methods, respectively. Additionally, analysis of the immunopeptidome (77) may be added to the beginning of this integrated workflow for cancer types harboring significant mutations. Discoveries made on protein and PTM abundance changes can be confirmed in additional cohorts using targeted proteomics methods (78) such as SRM (79) and parallel reaction monitoring (80-82).
The proteomics workflows mentioned above enable deep protein and PTM analysis; however, they necessitate bulk sample processing, which precludes investigating the role of different cell populations and tumor heterogeneity in disease (83,84). Recent technological improvements have enabled extension of MS-based proteomics to spatially resolved (85-93), cell type-resolved (91, 94, 95), and single-cell (52, 96-100) measurements. Cell types or regions of interest can be isolated using cell sorting (101-103) or laser capture microdissection (104, 105) and collected into microwell plates for further processing or coupled directly to recently developed chip-based platforms (106-108). The samples are then prepared with optimized protocols aimed at minimizing sample losses to surfaces and optimizing digestion kinetics at low sample concentrations, for example, nanoliter droplet processing (109,110), advanced microfluidic devices (111), and microplate approaches (99, 112).
After preparation, samples can be analyzed by either label-free quantification (LFQ) or isobaric labeling quantification approaches. For isobaric labeling, employing a carrier approach, which involves leveraging a TMT channel with significantly higher loading of peptides with similar composition to the study samples, has greatly improved sensitivity (100, 113), albeit at the cost of deteriorated quantitation (114-116). LFQ provides more accurate quantification; however, without sample multiplexing, less material is available for analysis, which requires further workflow customization (98, 99, 117, 118). The maturation of data-independent acquisition (DIA) algorithms (119-121) and analysis pipelines has resulted in significant improvements in peptide identification efficiency and reduced missingness even when the signal is limited (111,122). Another emerging technology that promises to significantly improve LFQ sensitivity is the integration of ion mobility separations between the LC and the MS (99, 123-125). Currently, applications in the spatial and single-cell domains have been largely limited to global proteomics measurements, but efforts aimed at the miniaturization of PTM enrichment are well underway (126,127). Perhaps even more exciting, advances in nanodroplet processing platforms when combined with ion mobility have produced the first demonstration of proteomics and transcriptomics from the same single cell, opening the possibility of single-cell proteogenomic measurements (128).

CHALLENGES
The integration and interpretation of proteogenomic measurements, and ultimately the test of their value, depend on the development of novel computational approaches. Data integration is first challenged by the fact that data exist on varying scales: genetic mutations are often assigned as discrete types of calls depending on the type of change in the DNA sequence (129,130), transcript measurements represent absolute changes in the mRNA relative to the length of the transcript and the depth of sequencing (131), and protein measurements are log ratio values representing the amount of sample measured relative to a standard control (132). Methods to overcome these challenges depend on the type of analysis at hand. Nonnegative matrix factorization helps identify clusters of samples that behave similarly across scales (133-135) using all types of omic data, whereas differential expression analyses can be performed between experimental conditions within each data type and the overlapping features then compared. Even after normalization, omic measurements do not agree as often as expected, as genetic mutations can fail to confer changes in expression (136,137), changes in RNA expression may not result in actual protein changes (137,138), and PTMs can be altered without changes in protein levels (139). As such, it is necessary to evaluate all omics measurements in an integrated fashion. These integrated approaches relate omics measurements or changes to published data by mapping changes directly to transcriptomic (140), proteomic (141), or phosphoproteomic networks (142,143) or by comparing changes to lists of genes that represent pathways (144) or signatures of response. These integrative and aggregative approaches have enabled proteogenomic studies in cancer to produce findings that are greater than the sum of their parts.
Despite these advancements in computational analysis tools, proteomics introduces a unique computational challenge to precision medicine. Most precision medicine-based approaches rely on large patient cohorts to identify mutated genes that signify a change in prognosis or treatment response. In gene expression studies, this approach has expanded beyond single genes to identify signatures or groups of transcripts that can be used to infer patient response (145,146). In proteomics, however, identification of signatures or biomarkers is stymied by (a) the relative quantitative nature of MS, requiring shared reference samples (147,148); (b) increased difficulty of detection for some proteins/peptides, resulting in potential biomarkers being missed (149); and (c) diversity in sample processing that causes large batch effects between downstream analyses. Experimental techniques such as sample pooling, common reference samples, and MS undersampling can lessen the impact of missingness in these data sets caused by absent peptides/proteins (150-152), but many bioinformatic challenges remain due to the numerous steps required for data processing and optimization.
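The scale differences described above can be illustrated with a short Python sketch. It shows transcript counts normalized for transcript length and sequencing depth (here as transcripts per million, one common choice), protein intensities expressed as log2 ratios to a shared reference channel, and a per-feature z-score that places either layer on a comparable scale before integration. The function names, the gene and protein identifiers, and the choice of TPM and z-scoring are illustrative assumptions rather than the methods of any specific study cited here.

```python
import math
from statistics import mean, pstdev


def tpm(counts, lengths_kb):
    """Transcripts per million from raw counts and transcript lengths (kb)."""
    # Length-normalize first, then scale so the sample sums to one million.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def log2_ratio_to_reference(sample, reference):
    """log2(sample / reference) for proteins quantified in both channels.

    Proteins missing from either channel are omitted, not imputed.
    """
    return {
        protein: math.log2(sample[protein] / reference[protein])
        for protein in sample
        if protein in reference and sample[protein] > 0 and reference[protein] > 0
    }


def zscore(values):
    """Center and scale one feature across samples (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Toy values, not taken from any data set.
    print(tpm({"GENE_A": 100, "GENE_B": 300}, {"GENE_A": 1.0, "GENE_B": 3.0}))
    print(log2_ratio_to_reference({"P1": 200.0, "P2": 50.0}, {"P1": 100.0, "P2": 100.0}))
    print(zscore([1.0, 2.0, 3.0]))
```

Expressing every sample relative to the same reference is what allows plexes run at different times to be compared, which is why the shared reference sample noted in point (a) is so important.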
Batch effects in any high-throughput computational workflow stem from the numerous steps in the data analysis pipeline, each of which can be performed by any of several tools that give different results. These steps, summarized in Figure 3, include (a) peak selection (153), (b) searching databases for peptide matches (154,155), (c) mapping peptides to proteins, (d) filtering for false discovery, and (e) imputation of missing data (156-159). Since each step can be applied with different parameters, or with entirely different databases, the pooling of data across patient cohorts needed for precision medicine requires accurate accounting of the tools and data utilized.
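Step (d) above can be sketched as follows. The example assumes a target-decoy strategy, a widely used way to estimate false discovery rates for peptide-spectrum matches (PSMs), although the text does not specify which method a given pipeline uses. PSMs are ranked by score, the FDR at each rank is estimated as the number of decoy hits divided by the number of target hits, and the estimates are converted to monotone q-values. The tuple layout, identifiers, and threshold are illustrative assumptions.

```python
def filter_by_fdr(psms, max_fdr=0.01):
    """Keep target PSMs whose q-value is at or below max_fdr.

    psms -- iterable of (psm_id, score, is_decoy); higher score is better.
    Returns a list of (psm_id, score, q_value) for accepted target PSMs,
    ordered from best to worst score.
    """
    ranked = sorted(psms, key=lambda psm: psm[1], reverse=True)

    # Running FDR estimate at each rank: decoys / targets seen so far.
    targets = decoys = 0
    fdrs = []
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this PSM would still be accepted,
    # obtained as a running minimum from the worst score upward.
    q_values = fdrs[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])

    return [
        (psm_id, score, q)
        for (psm_id, score, is_decoy), q in zip(ranked, q_values)
        if not is_decoy and q <= max_fdr
    ]


if __name__ == "__main__":
    # Toy PSMs, not taken from any search engine output.
    toy = [
        ("p1", 9.0, False),
        ("p2", 8.0, False),
        ("p3", 7.0, True),
        ("p4", 6.0, False),
        ("p5", 5.0, False),
    ]
    for row in filter_by_fdr(toy, max_fdr=0.1):
        print(row)
```

Because the accepted set depends on the score, the decoy database, and the threshold, two laboratories applying nominally the same step can report different protein lists, which is one source of the batch effects discussed above.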
Many tools have been developed to enable provenance across methodological variables. These tools fall into five categories, independently colored in Figure 3, and include (a) standardized data repositories with open application programming interfaces for data retrieval such as PRIDE (160), ProteomeXchange (161), Figshare (https://figshare.com/), and Synapse (https://www.synapse.org/); (b) open-source code repositories such as GitHub, GitLab, or Bitbucket that enable sharing of methods; (c) continuous integration tools that automate container building and check for quality; (d) container registries that store versioned images to run the tools such as Docker Hub and BioContainers; and (e) scientific workflow repositories that enable storage of the precise steps that run the necessary tools in the necessary order using languages such as the Common Workflow Language (162), Nextflow (163), and the Workflow Description Language (163). With these tools in place, scientists need only the standardized parameter files and the workflow language/container tools installed on their machine to run analyses. Proteomics analysis frameworks have only scratched the surface of scientific workflow development (164,165), but as these methods become more popular, larger cohorts can be harmonized for precision medicine analyses (166).
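As a small illustration of what accounting for tools and parameters can look like in practice, the Python sketch below builds a provenance manifest that records, for each pipeline step, the tool, its version, the container image, and a hash of the parameters used. It is not a substitute for the workflow languages named above; the step names, tool names, image tags, and manifest layout are all hypothetical.

```python
import hashlib
import json


def parameter_fingerprint(params):
    """Stable SHA-256 hash of a parameter dictionary.

    Keys are sorted so that the same settings always give the same hash,
    regardless of the order in which they were written.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(steps):
    """Summarize pipeline steps as a JSON-serializable provenance record.

    steps -- list of dicts with keys: name, tool, version, container, params
    """
    return {
        "steps": [
            {
                "name": step["name"],
                "tool": step["tool"],
                "version": step["version"],
                "container": step["container"],
                "params_sha256": parameter_fingerprint(step["params"]),
            }
            for step in steps
        ]
    }


if __name__ == "__main__":
    # Hypothetical pipeline description; no real tool is implied.
    pipeline = [
        {
            "name": "database_search",
            "tool": "example_search_tool",
            "version": "1.0.0",
            "container": "example-registry/search:1.0.0",
            "params": {"enzyme": "trypsin", "missed_cleavages": 2},
        },
        {
            "name": "fdr_filter",
            "tool": "example_fdr_tool",
            "version": "0.3.1",
            "container": "example-registry/fdr:0.3.1",
            "params": {"max_fdr": 0.01},
        },
    ]
    print(json.dumps(build_manifest(pipeline), indent=2))
```

Two cohorts whose manifests share the same versions and parameter hashes were processed identically at those steps; any difference flags a step that may need reprocessing before the data are pooled.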

CONCLUSIONS AND FUTURE DIRECTIONS
MS analyses of primary tumors have clearly been instrumental in major discoveries that have impacted our understanding of cancer biology, diagnostic precision, prognostication, and development of new therapeutic strategies (Figure 1, Table 1). Proteogenomics data have identified features of tumors that were indiscernible through genetic or transcriptomic analyses, and integration of all data types has led to additional discoveries, following the analytical process outlined in Figure 4. Key biological insights that have been consistently observed following the application of proteogenomics to multiple tumor types include (a) the addition of phosphoproteomics, which provided a more detailed characterization of downstream signaling pathways beyond the driving mutation, identifying potential alternative therapeutic targets (8,11,14,15,17,19,36,37); (b) stratification of tumors into immune-hot and immune-cold subtypes, which provided insights into factors that potentially modify the response to immunotherapy (16,17,43,60,63); and (c) identification of consistently discordant mRNA-protein pairs, which implicated translational regulation and protein degradation as important components of cancer biology (7-17, 167).
To further evolve the field of proteogenomics, more advances will be needed in several key areas.

Scaling Down: Miniaturizing Inputs
It will be important to harness and develop new technologies that facilitate testing of smaller amounts of input material (168). This is critical for several important reasons. First, primary tumors are often difficult to obtain and material is limited, especially in disease relapse stages for which there is a critical dearth of knowledge. Second, tumors are highly heterogeneous, and it is important to study each cellular component. Fractionation of these cell subpopulations often yields scarce numbers of cells for analysis. Finally, the capacity to study biology at a single-cell or near-single-cell level is pushing new frontiers in nucleic acid sequencing and promises to do the same for proteomics (including PTMs) (126,169,170) and/or metabolomics (171–173).

Scaling Up: Multiplexing
As noted above, the capacity for multiplexing samples has improved and continues to improve (71,174–176). This has come with great benefits for reducing costs and minimizing batch effects. In addition to the intratumoral heterogeneity noted above, there is also extreme heterogeneity across patients. As such, it is critical to develop multiplexing techniques and economies of scale to enable proteomic analysis of larger numbers of patient tumors. This will enable identification of proteomic patterns in tumor groups with less common clinical or genetic profiles, which collectively encompass large proportions of tumors. In addition, the breadth of PTM analytes that are measured by MS has increased dramatically (77,177,178), and it will be important to continue this trajectory to better understand the biology and clinical ramifications of these important protein modifications.

Dynamic Measurements
Nearly all proteomic data collection to date has been performed at static, baseline conditions. While this is a critical first step toward establishing the proteomic landscape of each tumor, it is also clear that tumors respond to therapeutic stress in diverse ways. Hence, it will only be through longitudinal and dose-dependent testing of dynamic changes that occur after patients have been treated with therapeutic regimens that we will be able to fully harness proteogenomic data for diagnostic, prognostic, and therapeutic deliverables (179). It may be possible to obtain some of this dynamic information through short-term, ex vivo exposure of primary tumor specimens to panels of agents, which will offer information on larger numbers of potential therapeutics than would be clinically feasible. The increasing sensitivity and throughput of MS proteomics approaches discussed above will be critical to accomplishing this goal.

Translating the Work
The proteogenomic analysis of diverse tumor types has yielded a wealth of findings, many of which point to potential new therapeutic strategies and/or new mechanistic insights that should be further pursued. Proteogenomics itself may also offer new diagnostic platforms to help guide when and where therapies are deployed. Continued progress in these areas will be essential for the ultimate fulfillment of the translational promise of proteogenomics.
Collectively, a great deal has been accomplished and learned through proteogenomic analyses of primary tumors. Through continued efforts along the same lines as well as expansion into the areas noted above, the future looks bright for proteogenomics to continue having a major impact on our knowledge of tumor biology and clinical care of patients with cancer.

Figure 1. Chalk talk highlighting the multiomics approach our group has taken to reveal the underlying biology of solid and liquid tumors. New mechanistic insights and therapeutic targets have emerged from our global and phosphoproteomic profiling of ovarian carcinoma (12) and AML (37). Abbreviation: AML, acute myeloid leukemia.

Figure 2. Radar plots comparing the analytical figures of merit for proteomics modes. (a) Targeted approaches include selected reaction monitoring (SRM) and internal standard triggered-parallel reaction monitoring (IS-PRM). (b) Discovery/global approaches include tandem mass tag with serial posttranslational modification enrichment (TMT-PTM) and data-independent acquisition (DIA). (c) Spatial and single-cell approaches. Protein coverage refers to the number of proteins that can be quantified in an experiment. Dynamic range is defined as the concentration range of proteins that can be accurately quantified. Reproducibility is the coefficient of variation of replicate analyses. Ease of implementation indicates how accessible the methodology is to general practitioners. Sample throughput is the number of patient samples that can be analyzed per unit time. Input requirement is defined as the amount of specimen needed for analysis, with single-cell methods having the smallest sample requirement.

Figure 3. Summary of scientific workflow tools to enable precision medicine proteomic analyses. (Top) Standardized databases enable storage of data in machine-readable formats. (Middle) Public repositories, continuous integration, and container registries enable tool developers for each of the five steps (from left to right) of analysis to create state-of-the-art tools and also maintain all versions. (Bottom) Scientific workflow languages link tools in sequence as needed by scientists, who provide standardized parameter files to reproduce analysis uniformly across large cohorts. Abbreviations: API, application programming interface; FDR, false discovery rate.

Figure 4. Leveraging proteogenomics in precision medicine. (Top) Sample procurement requires proper processing of clinical samples. (Right) Selection of mass spectrometry technology requires balancing trade-offs from each technology. (Left) Mapping proteogenomic measurements to clinical outcomes requires assembling diverse bioinformatic tools.